DNA1 180-1

METHODS AND COMPOSITIONS FOR INFERRING 
EYE COLOR AND HAIR COLOR 

BACKGROUND OF THE INVENTION 

FIELD OF THE INVENTION 
[0001] The invention relates generally to methods of determining pigmentation traits of 
an individual, and more specifically to methods of inferring eye color or hair color of an 
individual by identifying single nucleotide polymorphisms (SNPs) associated with eye color 
or hair color, respectively, in a nucleic acid sample of the individual, and to compositions 
useful for practicing such methods. 

BACKGROUND INFORMATION 
[0002] Biotechnology has revolutionized the field of forensics. More specifically, the 
identification of polymorphic regions in human genomic DNA has provided a means to 
distinguish individuals based on the occurrence of a particular nucleotide at each of several 
positions in the genomic DNA that are known to contain polymorphisms. As such, analysis 
of DNA from an individual allows a genetic fingerprint or "bar code" to be constructed that, 
with the possible exception of identical twins, essentially is unique to one particular 
individual in the entire human population. 

[0003] In combination with DNA amplification methods, which allow a large amount of 
DNA to be prepared from a sample as small as a spot of blood or semen or a hair follicle, 
DNA analysis has become a routine tool in criminal cases as evidence that can free or, in 
some cases, convict a suspect. Indeed, criminal courts, which do not yet allow the results of 
a lie detector test into evidence, admit DNA evidence into trial. In addition, DNA extracted 
from evidence that, in some cases, has been preserved for years after the crime was 
committed, has resulted in the convictions of many people being overturned. 

[0004] Although DNA fingerprint analysis has greatly advanced the field of forensics, 
and has resulted in the freedom of people who, in some cases, were erroneously imprisoned for 
years, current DNA analysis methods are limited. In particular, DNA fingerprinting 
analysis only provides confirmatory evidence that a particular person is, or is not, the person 
from whom the sample was derived. For example, while DNA in a semen sample can be 
used to obtain a specific "bar code", it provides no information about the person that left the 
sample. Instead, the bar code can only be compared to the bar code of a suspect in the 
crime. If the bar codes match, then it can reasonably be concluded that the person likely is 
the source of the semen. However, if there is not a match, the investigation must continue. 

[0005] An effort has begun to accumulate a database of bar codes, particularly of 
convicted criminals. Such a database allows prospective use of a bar code obtained from a 
biological sample left at a crime scene; i.e., the bar code of the sample can be compared, 
using computerized methods, to the bar codes in the database and, where the sample is that 
of a person whose bar code is in the database, a match can be obtained, thus identifying the 
person as the likely source of the sample from the crime scene. While the availability of 
such a database provides a significant advance in forensic analysis, the potential of DNA 
analysis is still limited by the requirement that the database must include information 
relating to the person who left the biological sample at the crime scene, and it likely will be 
a long time, if ever, before such a database provides information on an entire population. 
Thus, there is a need for methods that can provide prospective information about a subject 
from a nucleic acid sample of the subject. 

SUMMARY OF THE INVENTION 
[0006] The present invention provides methods of inferring the eye color of a human 
subject from a nucleic acid sample or a polypeptide sample of the subject, methods of 
inferring the hair color of a human subject from a nucleic acid sample or a polypeptide 
sample of the subject, and compositions for practicing such methods. The methods of the 
invention are based, in part, on the identification of single nucleotide polymorphisms 
(SNPs) that, alone or in combination, allow an inference to be drawn as to eye shade or eye 
color and as to hair color. As such, the methods can utilize the identification of haploid or 
diploid alleles of SNPs and/or haplotypes. The compositions and methods of the invention 
are useful, for example, as forensic tools for obtaining information relating to physical 
characteristics of a potential crime victim or a perpetrator of a crime from a nucleic acid 
sample present at a crime scene, and as tools to assist in breeding domesticated animals, 
livestock, and the like to contain a pigmentation trait as desired. 

[0007] In one embodiment, the invention relates to a method of inferring eye color of a 
human individual by determining the nucleotide occurrence of at least one SNP as set forth 
in Table 1 or Appendix Table 2 (see, also, Table 2; SEQ ID NOS:1 to 35). In one aspect, 
the method comprises determining the nucleotide occurrence of at least one SNP as set forth 
in any of SEQ ID NOS:1 to 3, 7 to 9, 11 to 13, 15 to 18, 20, 22 to 31, and 35. In another 
aspect, the method comprises determining the nucleotide occurrence of at least one SNP as 
set forth in Appendix Table 2. In still another aspect, the method comprises identifying at 
least two nucleotide occurrences of the SNP position, including, for example, diploid alleles 
corresponding to at least one SNP position, or a haplotype corresponding to at least two 
SNP positions. 

[0008] In another embodiment, the present invention relates to compositions useful for 
sampling a nucleic acid sample to determine a nucleotide occurrence of at least one SNP 
informative of eye color. Such compositions include, for example, oligonucleotide probes 
that selectively hybridize to a nucleic acid molecule as set forth in Table 1, Table 2, or 
Appendix Table 2, including one or the other of a nucleotide occurrence of a SNP (e.g., a 
nucleic acid molecule containing either a "G" or an "A" residue at the SNP position of SEQ 
ID NO:1 (see, also, Table 3; marker 2142)); or oligonucleotide primers that selectively 
hybridize to a position upstream or downstream (or both) of the nucleotide position such 
that a primer extension reaction or a nucleic acid amplification reaction can generate a 
product including the SNP position. Where the nucleotide occurrence of a SNP position is 
in a gene coding sequence, and the alternative forms of the SNP result in a change in the 
encoded amino acid, the composition for detecting the nucleotide occurrence at the SNP 
position can be an antibody that specifically binds to a polypeptide containing one or the 
other amino acid residue, but not to both such polypeptides. 

[0009] In still another embodiment, the invention relates to a method of inferring hair 
color of a human individual by determining the nucleotide occurrence of at least one SNP as 
set forth in Appendix Table 4. In one aspect, the method comprises identifying at least two 
nucleotide occurrences of the SNP position, including, for example, diploid alleles 
corresponding to at least one SNP position, or a haplotype corresponding to at least two 
SNP positions. 

[0010] In another embodiment, the present invention relates to compositions useful for 
sampling a nucleic acid sample to determine a nucleotide occurrence of at least one SNP 
informative of hair color. Such compositions include, for example, oligonucleotide probes 
that selectively hybridize to a nucleic acid molecule as set forth in Appendix Table 4, 
including one or the other of a nucleotide occurrence of a SNP; or oligonucleotide primers 
that selectively hybridize to a position upstream or downstream (or both) of the nucleotide 
position such that a primer extension reaction or a nucleic acid amplification reaction can 
generate a product including the SNP position. Where the nucleotide occurrence of a SNP 
position is in a gene coding sequence, and the alternative forms of the SNP result in a 
change in the encoded amino acid, the composition for detecting the nucleotide occurrence 
at the SNP position can be an antibody that specifically binds to a polypeptide containing 
one or the other amino acid residue, but not to both such polypeptides. 

BRIEF DESCRIPTION OF THE DRAWINGS 
[0011] Figure 1 shows the distribution of eye color scores determined as described in 
Example 1. 

[0012] Figure 2 shows the distribution of eye color related SNPs along the human 
chromosomes. Dots indicate known human pigmentation genes, and dashes represent the 
most strongly associated of the selected SNPs (27 shown; see Example 1). 

[0013] Figure 3 shows the distribution of hair color scores (melanin index) determined 
as described in Example 3. 

DETAILED DESCRIPTION OF THE INVENTION 
[0014] The present invention is based, in part, on the identification of a panel of single 
nucleotide polymorphisms (SNPs) that alone, or in combinations, allow an inference to be 
drawn as to the eye color of an individual or as to the hair color of an individual from a 
nucleic acid or protein sample of the individual. As disclosed herein, many of these SNPs 
came from a pan-genome screen and are dispersed among the chromosomes (see, for 
example, Figure 2, relating to eye color informative SNPs). As such, the SNPs can be used 
individually, and in combinations, including as haploid or diploid alleles, to draw an 
inference regarding eye color. In addition, where the SNPs are present in the same gene or 
are sufficiently linked, they can be assembled into haplotypes, and haploid and/or diploid 
haplotype alleles can be used to infer eye color. 

[0015] The term "haplotype" is used herein to refer to groupings of two or more 
nucleotide SNPs that are linked. As such, the SNPs can be present in the same gene or in 
adjacent genes or in a gene and an adjacent intergenic region, or otherwise present in the 
genome such that they segregate non-randomly. The term "haplotype alleles" as used 
herein refers to a non-random combination of nucleotide occurrences of SNPs that make up 
a haplotype. 

[0016] The term "penetrant pigmentation-related haplotype alleles" refers to haplotype 
alleles whose association with eye color pigmentation and/or hair color pigmentation is 
strong enough that it can be detected using simple genetics approaches. Corresponding 
haplotypes of penetrant pigmentation-related haplotype alleles are referred to herein as 
"penetrant pigmentation-related haplotypes." Similarly, individual nucleotide occurrences 
of SNPs are referred to herein as "penetrant pigmentation-related SNP nucleotide 
occurrences" if the association of the nucleotide occurrence with the eye color pigmentation 
trait (or hair color pigmentation trait) is strong enough on its own to be detected using 
simple genetics approaches, or if the SNP loci for the nucleotide occurrence make up part of 
a penetrant haplotype. The corresponding SNP loci are referred to herein as "penetrant 
pigmentation-related SNPs." Haplotype alleles of penetrant haplotypes are also referred to 
herein as "penetrant haplotype alleles" or "penetrant genetic features." Penetrant 
haplotypes are also referred to herein as "penetrant genetic feature SNP combinations." 

[0017] The term "latent pigmentation-related haplotype alleles" refers to haplotype 
alleles that, in the context of one or more penetrant haplotypes, strengthen the inference of 
the genetic eye color pigmentation trait and/or the genetic hair color pigmentation trait. 
Latent pigmentation-related haplotype alleles are typically alleles whose association with 
eye color (or hair color) pigmentation is not strong enough to be detected with simple 
genetics approaches. Latent pigmentation-related SNPs are individual SNPs that make up 
latent pigmentation-related haplotypes. 

[0018] A sample useful for practicing a method of the invention can be any biological 
sample of a subject that contains nucleic acid molecules, including portions of the gene 
sequences to be examined, or corresponding encoded polypeptides, depending on the 
particular method. As such, the sample can be a cell, tissue or organ sample, or can be a 
sample of a biological fluid such as semen, saliva, blood, and the like. A nucleic acid 
sample useful for practicing a method of the invention will depend, in part, on whether the 
SNPs to be identified are in coding regions or in non-coding regions. Thus, where at least 
one of the SNPs to be identified is in a non-coding region, the nucleic acid sample generally 
is a deoxyribonucleic acid (DNA) sample, particularly genomic DNA or an amplification 
product thereof. However, where heterogeneous nuclear ribonucleic acid (RNA), which includes 
unspliced mRNA precursor RNA molecules, is available, a cDNA or amplification product 
thereof can be used. Where each of the SNPs is present in a coding region of the 
pigmentation gene(s), the nucleic acid sample can be DNA or RNA, or products derived 
therefrom, for example, amplification products. Furthermore, while the methods of the 
invention generally are exemplified with respect to a nucleic acid sample, it will be 
recognized that particular SNP alleles can be in coding regions of a gene and can result in 
polypeptides containing different amino acids at the positions corresponding to the SNPs 
due to non-degenerate codon changes. As such, in one aspect, the methods of the invention 
can be practiced using a sample containing polypeptides of the subject. 

[0019] Methods of the invention can be practiced with respect to human subjects and, 
therefore, can be particularly useful for forensic analysis. In a forensic application of a 
method of the invention, the human nucleic acid sample can be obtained from a crime 
scene, using well established sampling methods. Thus, the sample can be a fluid sample or a 
swab sample. For example, the sample can be a swab sample, blood stain, semen stain, hair 
follicle, or other biological specimen, taken from a crime scene, or can be a soil sample 
suspected of containing biological material of a potential crime victim or perpetrator, can be 
material retrieved from under the finger nails of a potential crime victim, or the like, 
wherein nucleic acids (or polypeptides) in the sample can be used as a basis for drawing an 
inference as to eye color (or hair color) according to a method of the invention. 

[0020] A mammalian subject that can be examined according to a method of the 
invention can be any mammalian species. In particular, the methods are applicable to 
drawing an inference as to a pigmentation trait of a human subject. The human subject can 
be from a general population of mixed ethnicity, or the human subject can be of a particular 
ethnic background or race. For example, the subject can be a Caucasian. With respect to 
non-human mammalian species, the methods of the invention are valuable in providing 
predictions of commercially valuable eye color and/or hair color phenotypes, for example, 
in breeding. 

[0021] The sequences disclosed in Table 3 provide flanking nucleotide sequences for the 
SNPs disclosed herein. These flanking sequences serve to aid in the identification of the 
precise location of the SNPs in the human genome, and serve as target gene segments useful 
for performing methods of the invention. A target polynucleotide typically includes a SNP 
locus and a segment of a corresponding gene that flanks the SNP. Primers and probes that 
selectively hybridize at or near the target polynucleotide sequence, as well as specific 
binding pair members that can specifically bind at or near the target polynucleotide 
sequence, can be designed based on the disclosed gene sequences and information provided 
herein. 

[0022] As used herein, the term "selective hybridization" or "selectively hybridize," 
refers to hybridization under moderately stringent or highly stringent conditions such that a 
nucleotide sequence preferentially associates with a selected nucleotide sequence over 
unrelated nucleotide sequences to a large enough extent to be useful in identifying a 
nucleotide occurrence of a SNP. It will be recognized that some amount of non-specific 
hybridization is unavoidable, but is acceptable provided that hybridization to a target 
nucleotide sequence is sufficiently selective such that it can be distinguished over the 
non-specific cross-hybridization, for example, at least about 2-fold more selective, generally 
at least about 3-fold more selective, usually at least about 5-fold more selective, and 
particularly at least about 10-fold more selective, as determined, for example, by an amount 
of labeled oligonucleotide that binds to the target nucleic acid molecule as compared to a 
nucleic acid molecule other than the target molecule, particularly a substantially similar 
(i.e., homologous) nucleic acid molecule other than the target nucleic acid molecule. 
Conditions that allow for selective hybridization can be determined empirically, or can be 
estimated based, for example, on the relative GC:AT content of the hybridizing 
oligonucleotide and the sequence to which it is to hybridize, the length of the hybridizing 
oligonucleotide, and the number, if any, of mismatches between the oligonucleotide and 
sequence to which it is to hybridize (see, for example, Sambrook et al., "Molecular Cloning: 
A Laboratory Manual" (Cold Spring Harbor Laboratory Press 1989)). 
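
By way of non-limiting illustration, the following Python sketch shows how the GC:AT content and length of an oligonucleotide can feed into a rough estimate of hybridization behavior. It uses the Wallace rule (about 2°C per A or T and 4°C per G or C), a common rule of thumb for short oligonucleotides. The rule, the function names, and the example sequence are assumptions introduced here for illustration and are not part of the disclosure; conditions for an actual probe would still be determined empirically.

```python
def gc_fraction(oligo: str) -> float:
    """Return the fraction of G and C bases in an oligonucleotide."""
    seq = oligo.upper()
    if not seq:
        raise ValueError("empty sequence")
    return sum(base in "GC" for base in seq) / len(seq)


def wallace_tm(oligo: str) -> int:
    """Estimate melting temperature (deg C) with the Wallace rule.

    Tm = 2 * (A + T) + 4 * (G + C). This is a rule of thumb for short
    oligonucleotides only; it ignores salt, mismatches, and concentration.
    """
    seq = oligo.upper()
    at = sum(base in "AT" for base in seq)
    gc = sum(base in "GC" for base in seq)
    if at + gc != len(seq):
        raise ValueError("sequence contains characters other than A, C, G, T")
    return 2 * at + 4 * gc


# Hypothetical 18-mer probe, not taken from Table 3.
probe = "ATGCGTACCTGAGGCTAA"
print(f"GC fraction: {gc_fraction(probe):.2f}, Wallace Tm: {wallace_tm(probe)} C")
```

A longer or more GC-rich oligonucleotide yields a higher estimate, which is consistent with the use of more stringent wash conditions for such probes.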

[0023] An example of progressively higher stringency conditions is as follows: 2 x 
SSC/0.1% SDS at about room temperature (hybridization conditions); 0.2 x SSC/0.1% SDS 
at about room temperature (low stringency conditions); 0.2 x SSC/0.1% SDS at about 42°C 
(moderate stringency conditions); and 0.1 x SSC at about 68°C (high stringency conditions). 
Washing can be carried out using only one of these conditions, e.g., high stringency 
conditions, or each of the conditions can be used, e.g., for 10-15 minutes each, in the order 
listed above, repeating any or all of the steps listed. However, as mentioned above, optimal 
conditions will vary, depending on the particular hybridization reaction involved, and can 
be determined empirically. 

[0024] The term "polynucleotide" is used broadly herein to mean a sequence of 
deoxyribonucleotides or ribonucleotides that are linked together by a phosphodiester bond. 
For convenience, the term "oligonucleotide" is used herein to refer to a polynucleotide that 
is used as a primer or a probe. Generally, an oligonucleotide useful as a probe or primer 
that selectively hybridizes to a selected nucleotide sequence is at least about 15 nucleotides 
in length, usually at least about 18 nucleotides, and particularly about 21 nucleotides or 
more in length. 

[0025] A polynucleotide can be RNA or can be DNA, which can be a gene or a portion 
thereof, a cDNA, a synthetic polydeoxyribonucleic acid sequence, or the like, and can be 
single stranded or double stranded, as well as a DNA/RNA hybrid. In various 
embodiments, a polynucleotide, including an oligonucleotide (e.g., a probe or a primer) can 
contain nucleoside or nucleotide analogs, or a backbone bond other than a phosphodiester 
bond. In general, the nucleotides comprising a polynucleotide are naturally occurring 
deoxyribonucleotides, such as adenine, cytosine, guanine or thymine linked to 
2'-deoxyribose, or ribonucleotides such as adenine, cytosine, guanine or uracil linked to 
ribose. However, a polynucleotide or oligonucleotide also can contain nucleotide analogs, 
including non-naturally occurring synthetic nucleotides or modified naturally occurring 
nucleotides. Such nucleotide analogs are well known in the art and commercially available, 
as are polynucleotides containing such nucleotide analogs (Lin et al., Nucl. Acids Res. 
22:5220-5234 (1994); Jellinek et al., Biochemistry 34:11363-11372 (1995); Pagratis et al., 
Nature Biotechnol. 15:68-73 (1997), each of which is incorporated herein by reference). 

[0026] The covalent bond linking the nucleotides of a polynucleotide generally is a 
phosphodiester bond. However, the covalent bond also can be any of numerous other 
bonds, including a thiodiester bond, a phosphorothioate bond, a peptide-like bond or any 
other bond known to those in the art as useful for linking nucleotides to produce synthetic 
polynucleotides (see, for example, Tam et al., Nucl. Acids Res. 22:977-986 (1994); Ecker 
and Crooke, BioTechnology 13:351-360 (1995), each of which is incorporated herein by 
reference). The incorporation of non-naturally occurring nucleotide analogs or bonds 
linking the nucleotides or analogs can be particularly useful where the polynucleotide is to 
be exposed to an environment that can contain a nucleolytic activity, including, for example, 
a tissue culture medium or upon administration to a living subject, since the modified 
polynucleotides can be less susceptible to degradation. 

[0027] A polynucleotide or oligonucleotide comprising naturally occurring nucleotides 
and phosphodiester bonds can be chemically synthesized or can be produced using 
recombinant DNA methods, using an appropriate polynucleotide as a template. In 
comparison, a polynucleotide or oligonucleotide comprising nucleotide analogs or covalent 
bonds other than phosphodiester bonds generally is chemically synthesized, although an 
enzyme such as T7 polymerase can incorporate certain types of nucleotide analogs into a 
polynucleotide and, therefore, can be used to produce such a polynucleotide recombinantly 
from an appropriate template (Jellinek et al., supra, 1995). Thus, the term polynucleotide as 
used herein includes naturally occurring nucleic acid molecules, which can be isolated from 
a cell, as well as synthetic molecules, which can be prepared, for example, by methods of 
chemical synthesis or by enzymatic methods such as by the polymerase chain reaction 
(PCR). 

[0028] In various embodiments, it can be useful to detectably label a polynucleotide or 
oligonucleotide. Detectable labeling of a polynucleotide or oligonucleotide is well known 
in the art. Particular non-limiting examples of detectable labels include chemiluminescent 
labels, radiolabels, enzymes, haptens, or even unique oligonucleotide sequences. 

[0029] A method of identifying an eye color related SNP or a hair color related SNP 
also can be performed using a specific binding pair member. As used herein, the term 
"specific binding pair member" refers to a molecule that specifically binds or selectively 
hybridizes to another member of a specific binding pair. Specific binding pair members 
include, for example, probes, primers, polynucleotides, antibodies, etc. For example, a 
specific binding pair member can be a primer or a probe that selectively hybridizes to a 
target polynucleotide that includes a SNP locus, or that hybridizes to an amplification 
product generated using the target polynucleotide as a template, or can be an antibody that, 
under the appropriate conditions, selectively binds to a polypeptide containing one, but not 
the other, variant encoded by a polynucleotide comprising a particular SNP. 

[0030] Numerous methods are known in the art for determining the nucleotide 
occurrence for a particular SNP in a sample. Such methods can utilize one or more 
oligonucleotide probes or primers, including, for example, an amplification primer pair, that 
selectively hybridize to a target polynucleotide, which contains one or more pigmentation- 
related SNP positions. Oligonucleotide probes useful in practicing a method of the 
invention can include, for example, an oligonucleotide that is complementary to and spans a 
portion of the target polynucleotide, including the position of the SNP, wherein the presence 
of a specific nucleotide at the position (i.e., the SNP) is detected by the presence or absence 
of selective hybridization of the probe. Such a method can further include contacting the 
target polynucleotide and hybridized oligonucleotide with an endonuclease, and detecting 
the presence or absence of a cleavage product of the probe, depending on whether the 
nucleotide occurrence at the SNP site is complementary to the corresponding nucleotide of 
the probe. 

[0031] An oligonucleotide ligation assay also can be used to identify a nucleotide 
occurrence at a polymorphic position, wherein a pair of probes selectively hybridize 
upstream and adjacent to, and downstream and adjacent to, the site of the SNP, and wherein 
one of the probes includes a terminal nucleotide complementary to a nucleotide occurrence 
of the SNP. Where the terminal nucleotide of the probe is complementary to the nucleotide 
occurrence, selective hybridization includes the terminal nucleotide such that, in the 
presence of a ligase, the upstream and downstream oligonucleotides are ligated. As such, 
the presence or absence of a ligation product is indicative of the nucleotide occurrence at the 
SNP site. 
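
By way of non-limiting illustration, the logic of reading out such a ligation assay can be sketched in Python as follows. The sketch assumes a simplified model in which both probes are already hybridized adjacent to the SNP site and ligation occurs only when the terminal nucleotide of the allele-specific probe is complementary to the nucleotide occurrence in the target. The function names and the example G/A probe set are hypothetical and are not taken from the disclosure.

```python
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}


def ligation_expected(target_base, probe_terminal_base):
    """Return True if the probe's terminal base pairs with the target base at the SNP.

    Simplified model: ligation is assumed to occur only when the terminal
    base of the allele-specific probe is complementary to the target base.
    """
    return COMPLEMENT[target_base.upper()] == probe_terminal_base.upper()


def call_allele(target_base, allele_probes):
    """List the allele labels whose probe would yield a ligation product."""
    return [
        label
        for label, terminal_base in allele_probes.items()
        if ligation_expected(target_base, terminal_base)
    ]


# Hypothetical G/A SNP: each probe ends in the complement of the allele it detects.
probes = {"G allele": "C", "A allele": "T"}
print(call_allele("G", probes))  # ['G allele']
print(call_allele("A", probes))  # ['A allele']
```

In this model, a sample that is heterozygous at the SNP position would be expected to yield a ligation product with both allele-specific probes.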

[0032] An oligonucleotide also can be useful as a primer, for example, for a primer 
extension reaction, wherein the product (or absence of a product) of the extension reaction 
is indicative of the nucleotide occurrence. In addition, a primer pair useful for amplifying a 
portion of the target polynucleotide including the SNP site can be useful, wherein the 
amplification product is examined to determine the nucleotide occurrence at the SNP site. 
Particularly useful methods include those that are readily adaptable to a high throughput 
format, to a multiplex format, or to both. The primer extension or amplification product can 
be detected directly or indirectly and/or can be sequenced using various methods known in 
the art. Amplification products that span a SNP locus can be sequenced using traditional 
sequence methodologies (e.g., the "dideoxy-mediated chain termination method," also 
known as the "Sanger Method" (Sanger, F., et al., J. Molec. Biol. 94:441 (1975); Prober et 
al., Science 238:336-340 (1987)) and the "chemical degradation method," also known as the 
"Maxam-Gilbert method" (Maxam, A. M., et al., Proc. Natl. Acad. Sci. (U.S.A.) 74:560 
(1977)), both references herein incorporated by reference) to determine the nucleotide 
occurrence at the SNP locus. 

[0033] Methods of the invention can identify nucleotide occurrences at SNPs using a 
"microsequencing" method. Microsequencing methods determine the identity of only a 
single nucleotide at a "predetermined" site. Such methods have particular utility in 
determining the presence and identity of polymorphisms in a target polynucleotide. Such 
microsequencing methods, as well as other methods for determining the nucleotide 
occurrence at a SNP locus, are discussed in Boyce-Jacino et al., U.S. Pat. No. 6,294,336, 
which is incorporated herein by reference. 

[0034] Microsequencing methods include the Genetic Bit Analysis method disclosed by 
Goelet, P. et al. (WO 92/15712, herein incorporated by reference). Additional, primer- 
guided, nucleotide incorporation procedures for assaying polymorphic sites in DNA have 
also been described (Kornher et al., Nucl. Acids Res. 17:7779-7784 (1989); Sokolov, Nucl. 
Acids Res. 18:3671 (1990); Syvanen et al., Genomics 8:684-692 (1990); Kuppuswamy et 
al., Proc. Natl. Acad. Sci. (U.S.A.) 88:1143-1147 (1991); Prezant et al., Hum. Mutat. 1:159- 
164 (1992); Ugozzoli et al., GATA 9:107-112 (1992); Nyren et al., Anal. Biochem. 
208:171-175 (1993); and Wallace, WO89/10414). These methods differ from Genetic 
Bit™ analysis in that they all rely on the incorporation of labeled deoxynucleotides to 
discriminate between bases at a polymorphic site. In such a format, since the signal is 
proportional to the number of deoxynucleotides incorporated, polymorphisms that occur 
in runs of the same nucleotide can result in signals that are proportional to the length of the 
run (Syvanen et al., Amer. J. Hum. Genet. 52:46-59 (1993)). Alternative microsequencing 
methods have been provided by Mundy (U.S. Pat. No. 4,656,127) and Cohen et al. (French 
Patent 2,650,840; PCT Appl. No. WO 91/02087), which discuss a solution-based method 
for determining the identity of the nucleotide of a polymorphic site. As in the Mundy 
method of U.S. Pat. No. 4,656,127, a primer is employed that is complementary to allelic 
sequences immediately 3' to a polymorphic site. 

[0035] In response to the difficulties encountered in employing gel electrophoresis to 
analyze sequences, alternative methods for microsequencing have been developed. 
Macevicz (U.S. Pat. No. 5,002,867), for example, describes a method for determining 
nucleic acid sequence via hybridization with multiple mixtures of oligonucleotide probes. 
In accordance with such method, the sequence of a target polynucleotide is determined by 
permitting the target to sequentially hybridize with sets of probes having an invariant 
nucleotide at one position, and variant nucleotides at other positions. The Macevicz 
method determines the nucleotide sequence of the target by hybridizing the target with a set 
of probes, and then determining the number of sites that at least one member of the set is 
capable of hybridizing to the target (i.e., the number of "matches"). This procedure is 
repeated until each member of a set of probes has been tested. Boyce-Jacino et al. (U.S. 
Pat. No. 6,294,336) provide a solid phase sequencing method for determining the sequence 
of nucleic acid molecules (either DNA or RNA) by utilizing a primer that selectively binds 
a polynucleotide target at a site wherein the SNP is the most 3' nucleotide selectively bound 
to the target. 

[0036] In one particular commercial example of a method that can be used to identify a 
nucleotide occurrence of one or more SNPs, the nucleotide occurrences of pigmentation- 
related SNPs in a sample can be determined using the SNP-IT™ method (Orchid 
Biosciences, Inc.; Princeton, NJ). In general, the SNP-IT™ method is a 3-step primer 
extension reaction. In the first step a target polynucleotide is isolated from a sample by 
hybridization to a capture primer, which provides a first level of specificity. In a second 
step the capture primer is extended from a terminating nucleotide triphosphate at the target 
SNP site, which provides a second level of specificity. In a third step, the extended 
nucleotide triphosphate can be detected using a variety of known formats, including: direct 
fluorescence, indirect fluorescence, an indirect colorimetric assay, mass spectrometry, 
fluorescence polarization, etc. Reactions can be processed in 384 well format in an 
automated format using a SNPstream™ instrument (Orchid Biosciences, Inc.). Phase 
known data can be generated by inputting phase unknown raw data from the SNPstream™ 
instrument into Stephens and Donnelly's PHASE program. 

[0037] The method of identifying a nucleotide occurrence in the sample for at least one 
eye color related SNP or hair color related SNP, as discussed above, can further include 
grouping the nucleotide occurrences of the SNPs into one or more haplotype alleles 
indicative of eye color. For example, to infer eye color of a test subject, the identified 
haplotype alleles can be compared to known haplotype alleles, wherein the relationship of 
the known haplotype alleles to eye color is known. 

[0038] The following example is intended to illustrate but not limit the invention. 

EXAMPLE 1 

IDENTIFICATION OF SNPs INDICATIVE OF EYE COLOR 
[0039] This example describes the identification of SNPs useful for inferring eye color 
from a nucleic acid sample of an individual. 

[0040] Iris colors were measured using a Canon digital camera. Each subject peered 
into a cardboard box at one end, and the camera at the other end took the photo under a 
standardized brightness from a constant distance for each subject; 100 samples were collected using 
this method. Adobe Photoshop™ software was used to quantify the luminosity and the 
red/green, green/blue and red/blue wavelength reflectance ratios for the left iris; lighter eye 
colors had lower values for each of these variables. For each variable, the scores were 
scaled about the mean value. For example, an eye of the average red/green value received a 
new scaled value of 1, with those of value below the mean converted to values less than 1 
(proportional to their difference from the mean) and those greater than the mean converted 
to values greater than 1 (proportional to their difference from the mean). The scaled 
red/green, red/blue and green/blue values were summed for each eye. 
This sum was added to a scaled luminosity value for each eye to produce an eye color 
score for that eye. The eye color scores showed a continuous distribution (see FIG. 1). 
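
The scaling and summation described above can be sketched as follows. This is a minimal illustration only, assuming that "scaled about the mean" means division of each reading by the group mean; the function names and the three iris readings are hypothetical and are not data from the study.

```python
from statistics import mean

# Variables quantified for each iris, as named in the text above.
VARIABLES = ("luminosity", "red_green", "green_blue", "red_blue")


def scale_about_mean(values):
    """Divide each value by the group mean so an average eye receives 1."""
    group_mean = mean(values)
    return [value / group_mean for value in values]


def eye_color_scores(measurements):
    """Return one composite score per eye: scaled ratios plus scaled luminosity."""
    scaled = {
        name: scale_about_mean([m[name] for m in measurements])
        for name in VARIABLES
    }
    return [
        sum(scaled[name][i] for name in VARIABLES)
        for i in range(len(measurements))
    ]


# Hypothetical readings for three irises (illustration only).
irises = [
    {"luminosity": 60.0, "red_green": 0.9, "green_blue": 0.8, "red_blue": 0.7},
    {"luminosity": 100.0, "red_green": 1.0, "green_blue": 1.0, "red_blue": 1.0},
    {"luminosity": 140.0, "red_green": 1.1, "green_blue": 1.2, "red_blue": 1.3},
]
scores = eye_color_scores(irises)
```

Under this assumption an eye that is average on all four variables scores 4, and eyes with lower readings on each variable score below that.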

[0041] The lightest 21 eye color samples (at the top of the above distribution) were selected 
and pooled into a "Light" sample, and the darkest 21 eye color samples (at the bottom of the above 
distribution) were selected and pooled into a "Dark" sample. A GeneChip® Mapping 10K 
Array and Assay Set (Affymetrix; Santa Clara CA) was used to screen each pool. For each 
of the 10,000 SNPs on the GeneChip® array, an allele frequency was calculated for the 
Light pool and the Dark pool. The 10,000 SNPs were ranked based on the allele frequency 
differential between the two groups (Delta value), a Pearson's P value statistic, and an Odds 
Ratio statistic on the allele frequency differential between the two groups. The top 
100 SNPs based on the Odds Ratio statistic were selected, as were all others that were in the 
top 100 for Delta value and Pearson's P value (even if not in the top 100 based on the Odds 
ratio test) to produce a set of 130 SNPs. 
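
The three ranking statistics named above can be illustrated for a single SNP as follows. This is a sketch under stated assumptions: the pooled allele frequencies are treated as allele counts (21 individuals per pool, 42 chromosomes), the Pearson statistic is taken to be a 1-degree-of-freedom chi-square test on the 2x2 allele table, and the function name and counts are hypothetical.

```python
import math


def snp_statistics(light_a, light_b, dark_a, dark_b):
    """Compare counts of allele A and allele B between the Light and Dark pools."""
    light_n = light_a + light_b
    dark_n = dark_a + dark_b
    if min(light_n, dark_n, light_a + dark_a, light_b + dark_b) == 0:
        raise ValueError("each pool and each allele must be observed at least once")

    # Allele frequency differential between the two pools (Delta value).
    delta = abs(light_a / light_n - dark_a / dark_n)

    # Pearson chi-square statistic for the 2x2 table of allele counts.
    total = light_n + dark_n
    chi2 = total * (light_a * dark_b - light_b * dark_a) ** 2 / (
        light_n * dark_n * (light_a + dark_a) * (light_b + dark_b)
    )
    # Upper-tail probability of a chi-square variate with 1 degree of freedom.
    p_value = math.erfc(math.sqrt(chi2 / 2.0))

    # Odds ratio of allele A in the Light pool relative to the Dark pool.
    cross = light_b * dark_a
    odds_ratio = (light_a * dark_b) / cross if cross else math.inf

    return {"delta": delta, "chi2": chi2, "p_value": p_value, "odds_ratio": odds_ratio}


# Hypothetical counts: 42 chromosomes per pool.
example = snp_statistics(light_a=30, light_b=12, dark_a=18, dark_b=24)
```

For these hypothetical counts the Delta value is 12/42 (about 0.286), the chi-square statistic is 7.0, and the odds ratio is about 3.33.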

[0042] To validate which of the 130 SNPs were associated with iris colors, a second 
completely separate group of 100 samples was genotyped and ranked in the same way. The 
best 60 SNPs described in PCT/US02/16789, which is incorporated herein by reference, 
also were genotyped in this same sample of 100 subjects. Of the 190 candidate SNPs, 
approximately 30 showed either a good Delta value, Pearson's P value or Odds ratio test 
statistic. The distribution of the 30 selected SNPs along the chromosomes is shown in 
FIG. 2. Table 1 shows the delta value and chromosomal position for 27 of the SNPs, and 
indicates whether the SNP is located within a known pigmentation gene or within a few 
megabases (Mb) of a known pigmentation gene. 

[0043] Those SNPs indicated as located "in OCA2", "in ASIP" or "in TYRP1" in 
Table 1 previously were identified, and are disclosed in PCT/US02/16789; their inclusion 
in the list of Table 1 provides confirmation of their value as disclosed in PCT/US02/16789. 
The remaining SNPs are newly disclosed herein, and were identified using the Affymetrix 
chip. 

[0044] A classification model was built using the 27 SNPs listed in Table 1, whereby the 
200 subjects used to discover them were classified into Light or Dark eye color groups. 
Neural nets gave a classification accuracy of about 95% within-model, and about 
80% outside model. It is noted that neural nets generally require a much larger sample size 
for the number of variables used here. A simpler method was used to obtain a within-model 
accuracy of 97%. 

[0045] Table 2 provides a list of 35 SNPs, including 15 of the 27 SNPs shown in 
Table 1, and 20 additional SNPs. The designation "unknown" or "V2-unknown" is used to 
identify SNPs that were not disclosed in PCT/US02/16789 (SEQ ID NOS: 1 to 3, 7 to 9, 11 
to 13, 16 to 31 and 35; see, also, Table 3, and Appendix Table 2). The 20 additional SNPs in 
Table 2 were selected because they had interesting distributions that were helpful for 
classification analysis, but had less optimal P-values or delta values (Note: Table 1 has a 
cut-off Delta value of 0.125, whereas Table 2 includes 15 SNPs that also are in Table 1 (and 
have a Delta value greater than 0.125) as well as 20 SNPs having Delta values less than 
0.125, but otherwise having an interesting distribution). For example, one of the SNPs in 
Table 2 had an interesting distribution in that only 5 CT genotypes were observed (the rest were CC 
genotypes; i.e., T is rare), but the T occurred in Light eyes every time. Thus, while its Delta 
value and P-value were not very good, the SNP was selected as having potential interest 
(stress potential). 

[0046] Table 3 provides sequences that flank and include the SNPs listed in Table 2. 
Correspondence can be determined with reference to the "MARKER" number. The position 
of the SNP in the sequences is indicated in bold, and the alternative nucleotide occurrences 
are shown as ALL1 (Allele 1) and ALL2 (Allele 2). The gene and SNP names also are 
included. Additional flanking sequences can be determined by using the disclosed 
sequences to search a database such as GenBank (see, e.g., the National Center for 
Biotechnology Information, on the world wide web, URL "www.ncbi.nlm.nih.gov"). Based 
on these sequences, probes and primers, including primer pairs, can be designed for 
determining the nucleotide occurrence at a SNP position. 

EXAMPLE 2 

VALIDATION OF SNPs INDICATIVE OF EYE COLOR 
[0047] This Example describes methods used to confirm that SNPs identified as being 
related to eye color are useful for inferring eye color of an individual. 

[0048] A list of the allele frequency differential estimates from a set of about 800 
self-reported eye color samples, and from a second set of 100 samples where eye color was 
digitally classified, is provided in Appendix Table 1. Some of these SNPs were found in the 
first set of 800 and confirmed in the set of 100, while others were discovered from a 
separate set of 100 digitally classified samples and confirmed in the set of 100 shown in 
Appendix Table 1. For the ones found in the first set of 800, individual genotype (not 
pooled) data are available and, therefore, the delta values (allele frequency differentials) can be 
compared between light and dark groups. Most of the SNPs showed similar values between 
the two experiments (discovery in 800 and validation in 100). In fact, these SNPs were 
originally identified in a set of 100 self-reported samples and were validated several times in 
subsequent sets of 100, reaching 800 self-reported samples in total, before being validated once more 
in the 100 digital samples (the first 800 samples are referred to as the discovery set, for 
convenience). 

[0049] Referring to Appendix Table 1, SNPs colored red at the top of the list are those 
determined to be useful for classification. SNPs colored orange below also are well 
associated, though they were determined not to be as useful for classification. The 
uncolored SNPs (the last group, shown below the orange colored SNPs) are those that were 
previously identified (Frudakis et al., Genetics 165:2071-2083, 2003, which is incorporated 
herein by reference), but that did not look interesting in the validation set of 100. 
The delta value (allele frequency differential) was used rather than the p-value 
because the p-value depends on the sample size. A differential of 10% would be significant 
with a sample of 500 or so at the 0.05 level but not with a sample of 100. Since the interest 
is in confirming the original data, the p-value can be misleading because the sample sizes 
are unequal; the allele frequency differential is a better parameter to use. Most of the 
differentials are similar, showing good reproduction, even though the p-values for most of 
these differentials in a sample of 100 were not significant at the 0.05 level (many were 
close). The differences in delta value between the first 800 and the second 100 can be due to 
sample size effects, or because the eye colors were measured more objectively with the 
camera for the second 100. 
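
The dependence of the p-value on sample size can be illustrated with a short calculation. This sketch assumes a hypothetical SNP with allele frequencies of 0.45 and 0.55 in the light and dark groups (a 10% differential), samples split equally between the groups, two alleles counted per individual, and a 1-degree-of-freedom Pearson chi-square test; none of these specifics are taken from the study.

```python
import math


def differential_p_value(freq_light, freq_dark, alleles_per_group):
    """P-value for an allele frequency difference between two equal-sized groups."""
    a = freq_light * alleles_per_group   # allele 1 count, light group
    b = alleles_per_group - a            # allele 2 count, light group
    c = freq_dark * alleles_per_group    # allele 1 count, dark group
    d = alleles_per_group - c            # allele 2 count, dark group
    total = 2 * alleles_per_group
    chi2 = total * (a * d - b * c) ** 2 / (
        alleles_per_group * alleles_per_group * (a + c) * (b + d)
    )
    # Upper-tail probability of a chi-square variate with 1 degree of freedom.
    return math.erfc(math.sqrt(chi2 / 2.0))


# 100 individuals: 50 per group, 100 alleles per group.
p_100 = differential_p_value(0.45, 0.55, alleles_per_group=100)
# 500 individuals: 250 per group, 500 alleles per group.
p_500 = differential_p_value(0.45, 0.55, alleles_per_group=500)
```

With these assumptions the same 10% differential gives a p-value of about 0.16 for the sample of 100 and about 0.002 for the sample of 500, which is why the delta value is the more stable quantity to compare across sets of unequal size.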

[0050] Appendix Table 2 provides the sequences including and flanking each of the 
SNPs shown in Appendix Table 1 and Appendix Table 3 (below); the SNP position 
showing the alternative nucleotides is in brackets. A numerical unique identifier is provided 
for each SNP sequence, and the gene name, if the SNP is in a gene. The unique identifier 
"rs number" is shown for SNP sequences obtained from the public dbSNP database. The 
SNP sequences have been masked using BLAST and a repeat masking program. The 
masking process replaces repetitive elements (e.g., Alu repeats, LINE repeats, and simple 
repeats such as "CT" repeats) with "N", and identifies regions that should be avoided when 
designing primers because primers to these sequences would be non-specific (the repeated 
sequences can occur at thousands of places in the genome). Nucleotides indicated by lower 
case letters also are repeated sequences, but were not masked out because doing so would 
make it difficult to assay the SNP; modifications to the amplification conditions allow 
specific amplification of the SNPs in these sequences. Primers for detecting or identifying a 
SNP at a particular position can be prepared based on the disclosed sequences, or using 
additional flanking regions that can be identified using the exemplified sequences as probes. 
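
By way of illustration, a masked SNP sequence of the kind described above can be screened for primer sites as follows. This is a sketch only: the bracket notation "[C/T]", the example sequence, the window length and the function names are assumptions and are not taken from Appendix Table 2.

```python
import re

# Assumed notation for the SNP position: two alternative nucleotides in brackets.
SNP_PATTERN = re.compile(r"\[([ACGT])/([ACGT])\]", re.IGNORECASE)


def split_snp_sequence(sequence):
    """Split a sequence into left flank, (allele 1, allele 2) and right flank."""
    match = SNP_PATTERN.search(sequence)
    if match is None:
        raise ValueError("no bracketed SNP position found")
    alleles = (match.group(1).upper(), match.group(2).upper())
    return sequence[:match.start()], alleles, sequence[match.end():]


def primer_windows(flank, length):
    """Yield (start, window, has_repeat) for each window free of masked "N" bases.

    has_repeat is True when the window contains lower case (unmasked repeat)
    nucleotides, which may call for modified amplification conditions.
    """
    for start in range(len(flank) - length + 1):
        window = flank[start:start + length]
        if "N" in window.upper():
            continue  # overlaps a masked repetitive element
        yield start, window, any(base.islower() for base in window)


# Hypothetical masked sequence (illustration only).
left, alleles, right = split_snp_sequence(
    "ACGTNNNNGATTACAGGCT[C/T]ttagcGGATCCATGNNNN"
)
candidates = list(primer_windows(left, 8))
```

Here only the windows of the left flank that start after the masked "NNNN" run are returned as candidate primer sites.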

[0051] The SNP sequences listed under "highest rating" in Appendix Table 2 are those 
that were included in a set of about 40 SNPs that enabled good classification accuracy (see, 
also, Appendix Table 3, columns indicated with colored cells). Given the interactive effect 
at MC1R, the MC1R SNPs also were examined. Only one of the MC1R SNPs was interesting 
(R160W). The allele frequencies are low enough that they can be genotyped and, when 
combined with other data sets, may show interactive effects. 

[0052] Appendix Table 3 provides a file of samples (rows) showing interesting loci 
(columns). A simple classification model was used, wherein a score was given depending 
on the association of the genotype with "light" or "dark" eye color, as follows: 1) strongly 
associated with light - score = -1.0; 2) weakly associated with light - score = -0.5; 3) weakly 
associated with dark - score = +0.5; and 4) strongly associated with dark - score = +1.0. For all but 4 markers 
(columns BF, BG, BH, and BI), the scores are added linearly. For these four columns, 
including an OCA2 SNP and an MC1R SNP (the R160W coding variant), an interactive 
effect was identified, so a "combination" rule was imposed giving a score if the rule was 
met. 

[0053] The rules for using genotypes in the four columns appear under the data starting 
in row 109, column BI, and are as follows: 

RULE I     if 2168 (BF) CC 
           and 1867 (BH) GT or TT 
           then brown (7/7); add +5 

RULE II    if 2168 (BF) CC 
           and 1869 (BI) GT or TT 
           then brown (5/5); add +5 

RULE III   if 1879 (I) and 1916 (J) CC haplotype 
           then brown 14/15 times; add +5 to score 

RULE IV    if 2168 (BF) CC 
           and 2031 (BG) CC 
           and 1869 (BI) GT 
           then brown (1/1) 
           but not necessary to impose - all were brown before - keep rules down 

RULE V     if 2168 (BF) CC 
           and 2031 (BG) TC 
           and neither 1867 is GT or TT nor 1869 is GT or TT 
           then light (3/3); not necessary to impose - keep rules down 



[0054] The scores were added up in columns CN and DB (and iteratively in some 
preceding columns), and the sample was classified into the light group if the score was 
equal to or less than 0, otherwise it was classified into the dark group. As indicated by the 
color shading and values in these columns, the marker set generally was predictive of eye 
color. 
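
The scoring scheme described above can be sketched as follows. This is an illustration only: the additive markers ("marker_A", "marker_B") and their genotype scores are hypothetical, genotypes are assumed to be written in a fixed allele order (e.g., "GT", not "TG"), and only Rules I and II are implemented; Rule III is omitted because it requires phased haplotype data.

```python
# Hypothetical per-genotype scores for two additive markers:
# -1.0 / -0.5 for strong / weak association with light eye color,
# +0.5 / +1.0 for weak / strong association with dark eye color.
ADDITIVE_SCORES = {
    "marker_A": {"AA": -1.0, "AG": -0.5, "GG": +1.0},
    "marker_B": {"CC": +0.5, "CT": -0.5, "TT": -1.0},
}


def combination_bonus(genotypes):
    """Apply Rules I and II: add +5 for each rule that is met."""
    bonus = 0.0
    if genotypes.get("2168") == "CC":
        if genotypes.get("1867") in ("GT", "TT"):  # Rule I
            bonus += 5.0
        if genotypes.get("1869") in ("GT", "TT"):  # Rule II
            bonus += 5.0
    return bonus


def classify(genotypes):
    """Return ("light" or "dark", total score) for one sample."""
    score = sum(
        table.get(genotypes.get(marker), 0.0)
        for marker, table in ADDITIVE_SCORES.items()
    )
    score += combination_bonus(genotypes)
    # A total score equal to or less than 0 is classified as light.
    return ("light" if score <= 0 else "dark"), score


sample_dark = classify(
    {"marker_A": "AA", "marker_B": "TT", "2168": "CC", "1867": "GT"}
)
sample_light = classify({"marker_A": "AA", "marker_B": "CT"})
```

In the first hypothetical sample the additive score of -2.0 is outweighed by the +5 added under Rule I, so the sample is classified as dark; the second sample, with a score of -1.5, is classified as light.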

[0055] The 3 markers that were used in the context of combination rules were selected 
from a set of classification rules written using a hierarchical classification tree type of 
algorithm. At the time the rule was selected, it was not known that the OCA2 and MC1R 
SNPs would be useful markers; they were selected as providing the best combination rule 
involving markers. The identification of these SNPs provides an independent confirmation 
of the interactive effect between MC1R and OCA2 identified in the first data set of n=800, 
and published by others as relating to skin color. The results suggest that the proteins 
encoded by these genes may be physically coupled in the melanosome. 

EXAMPLE 3 

IDENTIFICATION OF SNPs INDICATIVE OF HAIR COLOR 
[0056] This Example describes the identification of SNPs that are indicative of hair color 
of an individual. 

[0057] Hair color was measured using a dermaspectrometer. A reflectance reading at 
650 nm is sensitive to the concentration of melanin in a sample, and is relatively insensitive 
to the hemoglobin concentration. In contrast, the level of reflectance at 550 nm is due to 
absorbance of light by both hemoglobin and melanin. By measuring at narrow regions 
around these two wavelengths, the melanin index (M) is computed as 100 x 
log(1/(% reflectance at 650 nm)) and the erythema index (E) as 100 x log((% reflectance at 
550 nm)/(% reflectance at 650 nm)) (Diffey et al., Brit. J. Dermatol. 111:663-672, 1984, 
which is incorporated herein by reference). When the melanin index was calculated for 
100 individuals, a continuous distribution about the mean melanin index was observed 
(Figure 4). 
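
The two indices can be computed as in the following sketch, which follows the formulas as written above. It assumes that the logarithm is base 10 and that the percent reflectance is converted to a fraction before the melanin index is taken; the function names and readings are hypothetical.

```python
import math


def melanin_index(reflectance_650_pct):
    """M = 100 x log10(1 / reflectance at 650 nm), reflectance as a fraction."""
    if not 0.0 < reflectance_650_pct <= 100.0:
        raise ValueError("percent reflectance must be in (0, 100]")
    return 100.0 * math.log10(1.0 / (reflectance_650_pct / 100.0))


def erythema_index(reflectance_550_pct, reflectance_650_pct):
    """E = 100 x log10(reflectance at 550 nm / reflectance at 650 nm), as written above."""
    if reflectance_550_pct <= 0.0 or reflectance_650_pct <= 0.0:
        raise ValueError("percent reflectance must be positive")
    return 100.0 * math.log10(reflectance_550_pct / reflectance_650_pct)


# Hypothetical readings: a darker sample reflects less light at 650 nm.
m_dark = melanin_index(10.0)
m_light = melanin_index(50.0)
```

Under these assumptions a sample reflecting 10% of the light at 650 nm has a melanin index of 100, and a sample reflecting 50% has a melanin index of about 30, so a higher index corresponds to darker hair.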

[0058] Two pools of samples were prepared - one pool containing 21 of the lightest hair 
colored individuals (low melanin index), and one pool containing 21 of the darkest hair 
colored individuals (high melanin index). DNA was extracted from buccal swabs of the 
individuals and genotyped using the GeneChip® Mapping 10K Array and Assay Set 
(Affymetrix; see Example 1). Odds ratios, Pearson's P values and allele frequency 
differentials between the two groups were calculated, and about 150 of the top SNPs were 
selected based on these three measurements. If a SNP was in the top 130 in terms of delta 
value (larger is better than smaller) it was selected. In addition, if a SNP was not in the top 
130 in terms of delta value, but was in the top 100 in terms of Pearson's P value (smaller is 
better) or Odds ratio (smaller is better), it also was selected. The SNPs, including flanking 
sequences, are listed in Appendix Table 4 (sequences were masked as described in 
Example 2). The selected hair color informative SNPs can be validated as described in 
Example 2. 
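
The selection rule described above amounts to a union of three ranked lists, which can be sketched as follows. The rank cut-offs and the "smaller is better" ordering for the Odds ratio follow the text; the function name, the dictionary layout and the four example SNPs are hypothetical.

```python
def select_snps(stats, top_delta=130, top_other=100):
    """Select SNPs ranked in the top by delta value, P value or Odds ratio.

    stats maps a SNP identifier to a dict with the keys
    "delta", "p_value" and "odds_ratio".
    """
    def top(key, count, largest_first):
        ranked = sorted(stats, key=lambda snp: stats[snp][key], reverse=largest_first)
        return set(ranked[:count])

    selected = top("delta", top_delta, largest_first=True)          # larger is better
    selected |= top("p_value", top_other, largest_first=False)      # smaller is better
    selected |= top("odds_ratio", top_other, largest_first=False)   # smaller is better, per the text
    return selected


# Hypothetical statistics for four SNPs, with cut-offs reduced to 1 for illustration.
toy_stats = {
    "s1": {"delta": 0.30, "p_value": 0.010, "odds_ratio": 0.90},
    "s2": {"delta": 0.10, "p_value": 0.001, "odds_ratio": 0.80},
    "s3": {"delta": 0.05, "p_value": 0.500, "odds_ratio": 0.20},
    "s4": {"delta": 0.02, "p_value": 0.600, "odds_ratio": 0.95},
}
toy_selection = select_snps(toy_stats, top_delta=1, top_other=1)
```

In this toy example "s1" is selected on delta value, "s2" on P value and "s3" on Odds ratio, while "s4" is not selected on any of the three measurements.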

[0059] Although the invention has been described with reference to the above example, 
it will be understood that modifications and variations are encompassed within the spirit and 
scope of the invention. Accordingly, the invention is limited only by the claims, which 
follow Tables 1 to 3. 

TABLE 1 

Marker   DELTA      Position       Pigment Gene 
2142     0.275      Xp11.23 
2190     0.247619   12q12 
2121     0.215476   1q21-23 
2189     0.211905   Xp11.23 
1879     0.199248   15q11.2-12     in OCA2 (15q11.2) 
1916     0.188095   15q11.2-12     in OCA2 (15q11.2) 
1908     0.183333   15q11.2-12     in OCA2 (15q11.2) 
2109     0.164286   1q25-31 
2177     0.157895   1q44 
2130     0.154762   13q12.3 
2191     0.15       3q23-q24       5Mb from HPS3 (3q23-q24) 
2126     0.141667   6q22 
1998     0.136905   10q24          [illegible] HPS1 (10q23) 
2110     0.136905   14q24.3 
2147     0.136905   12p11.2 
1876     0.132143   9p23           in TYRP1 (9p23) 
2113     0.130952   4q28-31.1 
2201     0.129762   5p15.2 
1979     0.128571   20q11.2-q12    in ASIP (20q11.2-q12) 
1986     0.128571   20q11.2-q12    in ASIP (20q11.2-q12) 
2178     0.128571   1p13 
2050     0.126566   3q24           in HPS3 (3q23-q24) 
2169     0.126316   13q31.5        2Mb from [illegible] 
1873     0.12619    15q11.2-q12    in OCA2 (15q11.2) 
2168     0.12619    1p34.3 
2156     0.125      11p11.2 
2205     0.125      16p13.2 

TABLE 2 

Gene               Marker   Delta (light/dark) 
UNKNOWN (1*)                0.275 
UNKNOWN (2)                 0.247619048 
UNKNOWN (3)                 0.21547619 
OCA2 (4)           1879     0.19924812 
OCA2 (5)           1916     0.188095238 
OCA2 (6)           1908     0.183333333 
UNKNOWN (7)                 0.024767802 
UNKNOWN (8)                 0.164285714 
UNKNOWN (9)                 0.154761905 
OCA2 (10)          1869     0.114285714 
UNKNOWN (11)                0.15 
UNKNOWN (12)                0.008333333 
UNKNOWN (13)                0.082894737 
OCA2 (14)          1905     0.021929825 
TYRP (15)          1948     0.078947368 
UNKNOWN (16)                0.062656642 
UNKNOWN (17)                0.136904762 
UNKNOWN (18)                0.061403509 
V2-Unknown (19)             0.128571429 
UNKNOWN (20)                0.086904762 
V2-Unknown (21)             0.128571429 
UNKNOWN (22)                0.057894737 
UNKNOWN (23)                0.136904762 
UNKNOWN (24)                0.055952381 
UNKNOWN (25)                0.129761905 
UNKNOWN (26)                0.041666667 
UNKNOWN (27)                0.128571429 
UNKNOWN (28)                0.051587302 
UNKNOWN (29)                0.126190476 
UNKNOWN (30)                0.042857143 
UNKNOWN (31)                0.112781955 
OCA2 (32)                   0.04047619 
OCA2 (33)                   0.112573099 
TYRP1 (34)                  0.107142857 
V2-Unknown (35)             0.101190476 

* Sequence Identifier (SEQ ID NO:) 

What is claimed is: 

1. A method for inferring eye color of a human subject from a nucleic acid sample 
of the subject, comprising identifying in the nucleic acid sample at least one eye color related 
SNP as set forth in Table 1, Table 3, or Appendix Table 2, whereby the nucleotide 
occurrence of the SNP is associated with eye color, thereby inferring eye color of the 
subject. 

2. A composition for inferring eye color of a human subject, comprising a specific 
binding pair member that selectively binds to a polynucleotide comprising a nucleotide 
occurrence of a SNP as set forth in Table 1, Table 3, or Appendix Table 2, or a polypeptide 
encoded thereby. 

3. A method for inferring hair color of a human subject from a nucleic acid sample 
of the subject, comprising identifying in the nucleic acid sample at least one hair color related 
SNP as set forth in Appendix Table 4, whereby the nucleotide occurrence of the SNP is 
associated with hair color, thereby inferring hair color of the subject. 

4. A composition for inferring hair color of a human subject, comprising a specific 
binding pair member that selectively binds to a polynucleotide comprising a nucleotide 
occurrence of a SNP as set forth in Appendix Table 4, or a polypeptide encoded thereby. 

METHODS AND COMPOSITIONS FOR INFERRING 
EYE COLOR AND HAIR COLOR 

ABSTRACT OF THE DISCLOSURE 

Methods for inferring eye color or hair color of an individual from a nucleic acid 
sample of the individual by detecting the nucleotide occurrence of an eye color related 
single nucleotide polymorphism (SNP) or of a hair color related SNP, respectively, are 
provided. Methods for inferring eye color or hair color of an individual from a protein 
sample of the individual by detecting an amino acid residue encoded by the nucleotide 
occurrence of an eye color related single nucleotide polymorphism (SNP) or a hair color 
related SNP, respectively, also are provided. In addition, compositions, including 
oligonucleotides and antibodies, useful for practicing such methods are provided. 

FIGURE 2 